Abstract

In a two-stage repeated classical game of prisoners' dilemma, the knowledge
that both players will defect in the second stage leads the players to defect
in the first stage as well. We find a quantum version of this repeated game in
which the players decide to cooperate in the first stage while knowing that
both will defect in the second.



1 Introduction 

The well-known simultaneous-move bimatrix game of prisoners' dilemma (PD)
attracted early attention [?] in recent studies of quantum game theory.
In classical game theory [?], a two-stage repeated version of this game consists
of the two players playing the game twice, observing the outcome of the first
play before the second play begins. The payoffs for the entire game are simply
taken as the sum of the payoffs from the two stages. Generally, a two-stage
repeated game has a more complex strategic structure than its one-stage
counterpart, and the players' strategic choices in the second stage are affected
by the outcome of their moves in the first stage. For the classical one-stage PD
game, the strategy of 'defection' by both players is well known as the unique
Nash equilibrium (NE). In its two-stage version the same NE appears again at the
second stage because the first-stage payoffs are added as constants to the
second stage. In fact, in all finitely repeated versions of the PD game the
strategy of 'defection' by both players appears as the unique NE at every
stage [?].

The recent and important study of the one-stage quantum PD game by Eisert,
Wilkens, and Lewenstein [?] prompts a question: what role can quantum mechanics
play when the game is played twice? This role should be relevant to the new
feature of the game, i.e. its two stages. A role for quantum mechanics exists if
it inter-links the two stages of the game in some way of interest. Classically
both players 'defect' at each stage, and the strategic choices remain the same
because of the uniqueness of the NE at each stage. In our search for the quantum
role we found it useful to study the idea of subgame-perfect outcome (SGPO) in a
two-stage repeated bimatrix game in its quantum form. For a two-stage repeated
game the idea of a SGPO is the natural analog of the backwards-induction outcome
(BIO) [?] studied in games of complete and perfect information. In a recent
paper [?] we considered the BIO idea in a quantum form of duopoly game and
showed how entanglement can give an outcome corresponding to the static form of
the duopoly even when the game is played dynamically. In the present paper we
study the natural analogue of BIO for a two-stage repeated PD quantum game,
i.e., the idea of SGPO in a situation that can be said to lie in the quantum
domain. We solve the two-stage PD quantum game in the spirit of backwards
induction studied in ref. [?], but now the first step in working backwards from
the end of the game involves solving a real game rather than solving a
single-person optimization problem as done in ref. [?]. Classically the idea of
SGPO comes out as a stronger solution concept, especially when multiple NE
appear in a stage. Our motivation is the observation that a quantization scheme
for the PD game is known in which the NE in a stage does not remain unique,
which makes the concept of SGPO relevant to the two-stage game played in a
quantum setting. For completeness, we first describe how SGPO works for the
classical two-stage PD game. Afterwards, we quantize the game using a known
scheme and then show how a SGPO can exist that is counter-intuitive compared to
the classical SGPO for the two-stage repeated PD game.

2 Two-stage games of complete but imperfect 
information 

Like the dynamic game of complete and perfect information (for example the
Stackelberg duopoly analyzed in ref. [?]), the play in a two-stage game of
complete but imperfect information proceeds in a sequence of two stages, with
the moves in the first stage observed before the next stage begins. The new
feature is that within each stage there are now simultaneous moves. The
simultaneity of moves within each stage means that information is imperfect in
the game. A two-stage game of complete but imperfect information consists of the
following steps [?]:

1. Players A and B simultaneously choose actions p and q from feasible sets 
P and Q, respectively. 

2. Players A and B observe the outcome of the first stage, (p, q), and then
simultaneously choose actions $p_1$ and $q_1$ from feasible sets P and Q,
respectively.

3. Payoffs are $P_i(p, q, p_1, q_1)$ for i = A, B.

A usual approach to solving a game from this class uses the method of backwards
induction. In ref. [?] the first step in working backwards involves solving a
single-person optimization problem. Now the first step involves solving the
real simultaneous-move game between players A and B in the second stage,
given the outcome from stage one. If the players A and B anticipate that their
second-stage behavior will be given by $(p_1^*(p, q), q_1^*(p, q))$, then the
first-stage interaction between players A and B amounts to the following
simultaneous-move game:

1. Players A and B simultaneously choose actions p and q from feasible sets
P and Q, respectively.

2. Payoffs are $P_i(p, q, p_1^*(p, q), q_1^*(p, q))$ for i = A, B.

When $(p^*, q^*)$ is the unique NE of this simultaneous-move game, the outcome
$(p^*, q^*, p_1^*(p^*, q^*), q_1^*(p^*, q^*))$ is known as the SGPO [?] of this
two-stage game. This outcome is the natural analog of the BIO in games of
complete and perfect information.

3 Two-stage prisoner's dilemma 
3.1 Classical form 

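We use a normal form of the PD game given by the following matrix [?]:

$$
\begin{array}{c|cc}
\text{Alice's strategy} \,\backslash\, \text{Bob's strategy} & C & D \\ \hline
C & (3,3) & (0,5) \\
D & (5,0) & (1,1)
\end{array}
\tag{1}
$$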

where C and D are for the strategies of 'cooperation' and 'defection' respectively. 
The players play this simultaneous-move game twice. The outcome of the first 
play is observed before the second stage begins. The payoff for the entire game 
is simply the sum of the payoffs from the two stages. It is a two-stage game of 
complete but imperfect information [?]. Suppose $p$ and $q$ are the probabilities
with which the pure strategy C is played by the players A and B, respectively,
in stage 1. Similarly, $p_1$ and $q_1$ are the probabilities with which the pure
strategy C is played by the players A and B, respectively, in stage 2. We
write $[P_{A1}]_{cl}$ and $[P_{B1}]_{cl}$ for the payoffs to players A and B,
respectively, in stage 1, where the subscript cl stands for "classical". These
payoffs can be found from the matrix (1) as

$$[P_{A1}]_{cl} = -pq + 4q - p + 1, \qquad [P_{B1}]_{cl} = -pq + 4p - q + 1 \tag{2}$$

The NE conditions for this stage are

$$[P_{A1}(p^*, q^*) - P_{A1}(p, q^*)]_{cl} \geq 0, \qquad [P_{B1}(p^*, q^*) - P_{B1}(p^*, q)]_{cl} \geq 0 \tag{3}$$

giving $p^* = q^* = 0$ (i.e. defection for both players) as the unique NE in this
stage. Likewise, in the second stage the payoffs to players A and B are written
as $[P_{A2}]_{cl}$ and $[P_{B2}]_{cl}$, respectively, where

$$[P_{A2}]_{cl} = -p_1 q_1 + 4q_1 - p_1 + 1, \qquad [P_{B2}]_{cl} = -p_1 q_1 + 4p_1 - q_1 + 1 \tag{4}$$

and once again the strategy of defection, i.e. $p_1^* = q_1^* = 0$, comes out as
the unique NE in the second stage. To compute the SGPO of this two-stage game,
we analyze the first stage of this two-stage PD game by taking into account the
fact that the outcome of the game remaining in the second stage will be the NE
of that remaining game, namely $p_1^* = q_1^* = 0$. At this NE the payoffs for
the second stage are

$$[P_{A2}(0, 0)]_{cl} = 1, \qquad [P_{B2}(0, 0)]_{cl} = 1 \tag{5}$$

Thus, the players' first-stage interaction in the two-stage PD amounts to a
one-shot game in which the payoff pair (1, 1) for the second stage is added to
each first-stage payoff pair. Writing this observation as

$$
\begin{aligned}
[P_{A(1+2)}]_{cl} &= [P_{A1} + P_{A2}(0,0)]_{cl} = -pq + 4q - p + 2 \\
[P_{B(1+2)}]_{cl} &= [P_{B1} + P_{B2}(0,0)]_{cl} = -pq + 4p - q + 2
\end{aligned}
\tag{6}
$$

we see that the one-shot game again has the strategy pair (0, 0) as its unique
NE. Therefore, the unique SGPO of the two-stage PD game is (0, 0) in the first
stage, followed by (0, 0) in the second stage. The strategy of defection in both
stages appears as the SGPO for the two-stage classical PD game.
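
The backwards-induction argument above can be checked with a minimal Python sketch. It assumes the stage payoffs (3,3), (0,5), (5,0), (1,1) implied by the payoff formulas of this section, and it restricts the equilibrium search to pure strategies, which is enough here because defection is strictly dominant. The names `PD`, `expected_payoffs`, `pure_nash`, and `two_stage_sgpo` are illustrative and do not come from the paper.

```python
from itertools import product

# Stage-game payoffs (Alice, Bob); keys are (Alice's move, Bob's move).
PD = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
      ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
MOVES = ("C", "D")


def expected_payoffs(p, q, game=PD):
    """Expected (Alice, Bob) payoffs when C is played with probabilities p, q."""
    prob_a = {"C": p, "D": 1 - p}
    prob_b = {"C": q, "D": 1 - q}
    alice = sum(prob_a[a] * prob_b[b] * game[(a, b)][0]
                for a, b in product(MOVES, repeat=2))
    bob = sum(prob_a[a] * prob_b[b] * game[(a, b)][1]
              for a, b in product(MOVES, repeat=2))
    return alice, bob


def pure_nash(game):
    """All pure-strategy Nash equilibria of a 2x2 bimatrix game."""
    found = []
    for a, b in product(MOVES, repeat=2):
        alice_ok = all(game[(a, b)][0] >= game[(x, b)][0] for x in MOVES)
        bob_ok = all(game[(a, b)][1] >= game[(a, y)][1] for y in MOVES)
        if alice_ok and bob_ok:
            found.append((a, b))
    return found


def two_stage_sgpo(game):
    """Backwards induction: solve stage 2, add its NE payoffs, solve stage 1."""
    (stage2,) = pure_nash(game)  # assumes the stage game has a unique pure NE
    bonus = game[stage2]
    reduced = {moves: (pay[0] + bonus[0], pay[1] + bonus[1])
               for moves, pay in game.items()}
    (stage1,) = pure_nash(reduced)
    return stage1, stage2


if __name__ == "__main__":
    print(expected_payoffs(0.3, 0.8))  # compare with -pq + 4q - p + 1 etc.
    print(two_stage_sgpo(PD))          # (('D', 'D'), ('D', 'D'))
```

Adding the same constant pair to every first-stage payoff leaves the best replies unchanged, which is why the reduced game has the same equilibrium as the stage game.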

We now see how it becomes possible, in a quantum form of this two-stage PD
game, to achieve a SGPO in which the players decide to cooperate in the first
stage while knowing that they will both defect in the second. The quantum form
of the two-stage PD game is played using a system of four qubits. The players'
moves consist of manipulating these qubits with two unitary and Hermitian
operators (the identity and the inversion operator), following Marinatto and
Weber's scheme [?] for playing a quantum form of a matrix game.

3.2 Quantum form 

A quantum version of a two-stage game must have the corresponding classical
two-stage game as a subset [?]. A scheme in which this requirement is satisfied
via control of the initial state is Marinatto and Weber's idea of playing a
quantum version of a matrix game [?]. The scheme was originally proposed to play
a quantum form of a one-stage bimatrix game, the battle of the sexes. The
fundamental idea can be extended to play a two-stage version of a bimatrix
game. For example, the two-stage quantum version of the PD game starts by
making a 4-qubit pure quantum state available to the players. This state can
be written as

$$|\psi_{ini}\rangle = \sum_{i,j,k,l=1,2} c_{ijkl}\, |ijkl\rangle \quad \text{where} \quad \sum_{i,j,k,l=1,2} |c_{ijkl}|^2 = 1 \tag{7}$$

where $i$, $j$, $k$, and $l$ are identifying symbols for the four qubits. The
upper and lower states of a qubit are 1 and 2, respectively, and $c_{ijkl}$ are
complex numbers. It is a quantum state in a $2 \otimes 2 \otimes 2 \otimes 2$
dimensional Hilbert space. We suppose the qubits $i$ and $j$ are manipulated by
the players in the first stage of the game and, similarly, the qubits $k$ and
$l$ are manipulated in the second stage. Let $\rho_{ini}$ denote the density
matrix for the initial state (7). Suppose that during their moves in the first
stage of the game, the players A and B apply the identity operator $I$ on
$|\psi_{ini}\rangle$ with probabilities $p$ and $q$, respectively. The inversion
operator $C$ is then applied [?] with probabilities $(1 - p)$ and $(1 - q)$,
respectively. The players' actions in the first stage change $\rho_{ini}$ to

$$
\begin{aligned}
\rho_{fin} ={}& pq\, I_A \otimes I_B\, \rho_{ini}\, I_A^\dagger \otimes I_B^\dagger
 + p(1-q)\, I_A \otimes C_B\, \rho_{ini}\, I_A^\dagger \otimes C_B^\dagger \\
&+ q(1-p)\, C_A \otimes I_B\, \rho_{ini}\, C_A^\dagger \otimes I_B^\dagger
 + (1-p)(1-q)\, C_A \otimes C_B\, \rho_{ini}\, C_A^\dagger \otimes C_B^\dagger
\end{aligned}
\tag{8}
$$

We suppose that the actions of the players in this stage are simultaneous and
that they remember their moves (i.e. the numbers $p$ and $q$) in the next stage
also. In the second stage the players A and B apply the identity operator with
probabilities $p_1$ and $q_1$, respectively, on $\rho_{fin}$. The inversion
operator $C$ is then applied with probabilities $(1 - p_1)$ and $(1 - q_1)$ on
$\rho_{fin}$, respectively. Fig. 1 shows the overall idea of playing the
two-stage game. One notices that the moves or actions of the players in the two
stages of the game are done on two different pairs of qubits.

After the moves of the second stage the quantum state changes to

$$
\begin{aligned}
\rho_{ffin} ={}& p_1 q_1\, I_A \otimes I_B\, \rho_{fin}\, I_A^\dagger \otimes I_B^\dagger
 + p_1(1-q_1)\, I_A \otimes C_B\, \rho_{fin}\, I_A^\dagger \otimes C_B^\dagger \\
&+ q_1(1-p_1)\, C_A \otimes I_B\, \rho_{fin}\, C_A^\dagger \otimes I_B^\dagger
 + (1-p_1)(1-q_1)\, C_A \otimes C_B\, \rho_{fin}\, C_A^\dagger \otimes C_B^\dagger
\end{aligned}
\tag{9}
$$

Figure 1: Playing a two-stage quantum game of prisoner's dilemma. I and C
are unitary and Hermitian operators.

The state $\rho_{ffin}$ is now ready for a measurement, giving payoffs for the
two stages of the game. If classically the bimatrix game (1) is played at each
stage, the possession of the following four payoff operators by the 'measuring
agent' corresponds to a quantum version of the two-stage game:

$$
\begin{aligned}
(P_{A1})_{oper} &= \sum_{k,l=1,2} \{ 3\,|11kl\rangle\langle 11kl| + 5\,|21kl\rangle\langle 21kl| + |22kl\rangle\langle 22kl| \} \\
(P_{A2})_{oper} &= \sum_{i,j=1,2} \{ 3\,|ij11\rangle\langle ij11| + 5\,|ij21\rangle\langle ij21| + |ij22\rangle\langle ij22| \} \\
(P_{B1})_{oper} &= \sum_{k,l=1,2} \{ 3\,|11kl\rangle\langle 11kl| + 5\,|12kl\rangle\langle 12kl| + |22kl\rangle\langle 22kl| \} \\
(P_{B2})_{oper} &= \sum_{i,j=1,2} \{ 3\,|ij11\rangle\langle ij11| + 5\,|ij12\rangle\langle ij12| + |ij22\rangle\langle ij22| \}
\end{aligned}
\tag{10}
$$

The corresponding payoffs are then obtained as mean values of these operators
[?]. For example, Alice's payoff in stage 1 is

$$[P_{A1}]_{qu} = \mathrm{Trace}\{ (P_{A1})_{oper}\, \rho_{ffin} \} \tag{11}$$



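We consider a two-stage quantum PD game played with an initial state of the
form $|\psi_{ini}\rangle = c_1|1111\rangle + c_2|1122\rangle + c_3|2211\rangle + c_4|2222\rangle$
with $\sum_{t=1}^{4} |c_t|^2 = 1$. For this state the payoffs to the players A
and B in the two stages are found as

$$
\begin{aligned}
[P_{A1}]_{qu} &= (|c_1|^2 + |c_2|^2)(-pq + 4q - p + 1) + (|c_3|^2 + |c_4|^2)(-pq - 3q + 2p + 3) \\
[P_{A2}]_{qu} &= (|c_1|^2 + |c_3|^2)(-p_1 q_1 + 4q_1 - p_1 + 1) + (|c_2|^2 + |c_4|^2)(-p_1 q_1 - 3q_1 + 2p_1 + 3) \\
[P_{B1}]_{qu} &= (|c_1|^2 + |c_2|^2)(-pq + 4p - q + 1) + (|c_3|^2 + |c_4|^2)(-pq - 3p + 2q + 3) \\
[P_{B2}]_{qu} &= (|c_1|^2 + |c_3|^2)(-p_1 q_1 + 4p_1 - q_1 + 1) + (|c_2|^2 + |c_4|^2)(-p_1 q_1 - 3p_1 + 2q_1 + 3)
\end{aligned}
\tag{12}
$$

The players' payoffs in the classical two-stage PD game of eqs. (2) and (4) can
now be recovered from eq. (12) by making the initial state unentangled and
fixing $|c_1|^2 = 1$. The classical game is, therefore, a subset of its quantum
version.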
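
The reduction to the classical game can be checked numerically with a short Python sketch. It relies on one observation that is an assumption of this sketch rather than a statement from the paper: the identity and inversion operators only permute the basis states |ijkl>, and the payoff operators are diagonal in that basis, so the mean values depend only on the weights |c_ijkl|^2 and reduce to a weighted sum over basis states. The helper names `flip` and `quantum_payoffs` are hypothetical.

```python
from itertools import product

# Stage payoffs (Alice, Bob) for a measured qubit pair;
# qubit state 1 = upper (cooperate), 2 = lower (defect).
PAYOFF = {(1, 1): (3, 3), (1, 2): (0, 5), (2, 1): (5, 0), (2, 2): (1, 1)}


def flip(state):
    """Inversion operator C on one qubit: exchanges states 1 and 2."""
    return 3 - state


def quantum_payoffs(amplitudes, p, q, p1, q1):
    """Return [P_A1, P_B1, P_A2, P_B2] after both stages of moves.

    amplitudes: dict mapping basis labels (i, j, k, l) to amplitudes c_ijkl.
    p, q:   probabilities of applying the identity in stage 1 (Alice, Bob).
    p1, q1: the same probabilities for stage 2.
    """
    norm = sum(abs(c) ** 2 for c in amplitudes.values())
    totals = [0.0, 0.0, 0.0, 0.0]
    # Each flag says whether the inversion C is applied to that qubit.
    for fa, fb, fa1, fb1 in product((False, True), repeat=4):
        weight = 1.0
        for flag, prob_identity in ((fa, p), (fb, q), (fa1, p1), (fb1, q1)):
            weight *= (1 - prob_identity) if flag else prob_identity
        for (i, j, k, l), c in amplitudes.items():
            prob = weight * abs(c) ** 2 / norm
            stage1 = PAYOFF[(flip(i) if fa else i, flip(j) if fb else j)]
            stage2 = PAYOFF[(flip(k) if fa1 else k, flip(l) if fb1 else l)]
            totals[0] += prob * stage1[0]
            totals[1] += prob * stage1[1]
            totals[2] += prob * stage2[0]
            totals[3] += prob * stage2[1]
    return totals


if __name__ == "__main__":
    classical = {(1, 1, 1, 1): 1.0}  # unentangled state with |c_1|^2 = 1
    print(quantum_payoffs(classical, 0.3, 0.8, 0.5, 0.2))
```

For the unentangled state |1111> the first entry agrees with the classical expression -pq + 4q - p + 1, and the third with -p_1 q_1 + 4 q_1 - p_1 + 1.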

One now proceeds, in the spirit of backwards induction, to find a NE in
the second stage of the quantum game. Suppose $(p_1^*, q_1^*)$ is a NE in the
second stage; then

$$[P_{A2}(p_1^*, q_1^*) - P_{A2}(p_1, q_1^*)]_{qu} \geq 0, \qquad [P_{B2}(p_1^*, q_1^*) - P_{B2}(p_1^*, q_1)]_{qu} \geq 0 \tag{13}$$

With the players' payoffs of the two stages given by eq. (12), the Nash
inequalities (13) can be written as

$$
\begin{aligned}
(p_1^* - p_1)\{ -q_1^* + 2(|c_2|^2 + |c_4|^2) - (|c_1|^2 + |c_3|^2) \} &\geq 0 \\
(q_1^* - q_1)\{ -p_1^* + 2(|c_2|^2 + |c_4|^2) - (|c_1|^2 + |c_3|^2) \} &\geq 0
\end{aligned}
\tag{14}
$$

and the strategy of defection by both players, i.e. $p_1^* = q_1^* = 0$, becomes
a NE in the second stage of the quantum game if

$$\{ 2(|c_2|^2 + |c_4|^2) - (|c_1|^2 + |c_3|^2) \} \leq 0 \tag{15}$$

Similar to the classical analysis, one then finds the players' payoffs when both
decide to defect in the second stage:

$$[P_{A2}(0,0)]_{qu} = [P_{B2}(0,0)]_{qu} = 3(|c_2|^2 + |c_4|^2) + (|c_1|^2 + |c_3|^2) \tag{16}$$

The classical payoffs of eq. (5), obtained when both players defect, can be
recovered from eq. (16) when $|c_1|^2 = 1$, i.e. when the initial state becomes
unentangled.

Like the classical case, the players' first-stage interaction in the two-stage
quantum PD amounts to a one-shot game in which the payoff
$3(|c_2|^2 + |c_4|^2) + (|c_1|^2 + |c_3|^2)$ for the second stage is added to
each first-stage payoff, i.e.

$$
\begin{aligned}
[P_{A(1+2)}]_{qu} = [P_{A1} + P_{A2}(0,0)]_{qu} ={}& |c_1|^2(-pq + 4q - p + 2) + |c_2|^2(-pq + 4q - p + 4) \\
&+ |c_3|^2(-pq - 3q + 2p + 4) + |c_4|^2(-pq - 3q + 2p + 6) \\
[P_{B(1+2)}]_{qu} = [P_{B1} + P_{B2}(0,0)]_{qu} ={}& |c_1|^2(-pq + 4p - q + 2) + |c_2|^2(-pq + 4p - q + 4) \\
&+ |c_3|^2(-pq - 3p + 2q + 4) + |c_4|^2(-pq - 3p + 2q + 6)
\end{aligned}
\tag{17}
$$

Now the strategy of cooperation ($p^* = q^* = 1$) becomes a NE for the
first-stage interaction in this quantum game if

$$\{ 2(|c_1|^2 + |c_2|^2) - (|c_3|^2 + |c_4|^2) \} \leq 0 \tag{18}$$

The inequalities (15) and (18) define the conditions to be satisfied for the
players to cooperate in their first-stage interaction while both defect in the
next stage. Using the normalization of the initial state, these conditions can
be rewritten as

$$|c_1|^2 + |c_2|^2 \leq \tfrac{1}{3}, \qquad |c_2|^2 + |c_4|^2 \leq \tfrac{1}{3} \tag{19}$$

For example, these conditions hold when $|c_1|^2 = |c_2|^2 = |c_4|^2 \leq \tfrac{1}{6}$,
with $|c_3|^2$ fixed by normalization. Because for the classical game
($|c_1|^2 = 1$) the inequalities (19) cannot hold together, it is not possible
classically for the players to cooperate in the first stage knowing that they
will both defect in the second.
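
A small Python sketch can illustrate how the conditions (15) and (18) separate the quantum and classical cases. The weights `w = (|c_1|^2, |c_2|^2, |c_3|^2, |c_4|^2) = (0.1, 0.1, 0.7, 0.1)` are an illustrative choice made for this sketch, not values taken from the paper, and the function names are hypothetical. The grid search is only a numerical check of the best reply, not a proof.

```python
def defect_is_stage2_ne(w):
    """Condition (15): defection (p1 = q1 = 0) is a NE in the second stage."""
    w1, w2, w3, w4 = w
    return 2 * (w2 + w4) - (w1 + w3) <= 0


def cooperate_is_stage1_ne(w):
    """Condition (18): cooperation (p = q = 1) is a NE in the first stage."""
    w1, w2, w3, w4 = w
    return 2 * (w1 + w2) - (w3 + w4) <= 0


def alice_total(w, p, q):
    """Alice's reduced first-stage payoff [P_A(1+2)]_qu of eq. (17)."""
    w1, w2, w3, w4 = w
    return (w1 * (-p * q + 4 * q - p + 2)
            + w2 * (-p * q + 4 * q - p + 4)
            + w3 * (-p * q - 3 * q + 2 * p + 4)
            + w4 * (-p * q - 3 * q + 2 * p + 6))


def best_reply_to_cooperation(w, steps=100):
    """Grid search for Alice's best p against q = 1 in the reduced game."""
    grid = [n / steps for n in range(steps + 1)]
    return max(grid, key=lambda p: alice_total(w, p, 1.0))


if __name__ == "__main__":
    quantum = (0.1, 0.1, 0.7, 0.1)    # illustrative weights, sum to 1
    classical = (1.0, 0.0, 0.0, 0.0)  # unentangled state |1111>
    for name, w in (("quantum", quantum), ("classical", classical)):
        print(name, defect_is_stage2_ne(w), cooperate_is_stage1_ne(w),
              best_reply_to_cooperation(w))
```

With the illustrative weights both conditions hold and Alice's best reply to a cooperating Bob is to cooperate; for the classical state condition (18) fails and her best reply is to defect.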

4 Discussion and conclusion 

Classical analysis tells us that repeated games differ from one-shot games
because players' current actions can depend on the past behavior of the other
players. In a repeated bimatrix game the same matrix game is played repeatedly,
over a number of stages that represent the passing of time. The payoffs are
accumulated over time. The accumulation of information about the "history" of
the game changes the structure of the game with time. With each new stage the
information at the disposal of the players changes and, since strategies
transform this information into actions, the players' strategic choices are
affected. If a game is repeated twice, the players' moves at the second stage
depend on the outcome of the first stage. This situation becomes more and more
complex as the number of stages increases, since the players can base their
decisions on histories represented by sequences of actions and outcomes observed
over an increasing number of stages.

Recent interesting findings in quantum game theory motivate a study of
repeated games in the new quantum settings, because an extensive and useful
analysis of repeated games already exists in the literature of classical
game theory. In the present paper, to look for a quantum role in repeated games,
we consider a quantum form of the well-known bimatrix game called prisoners'
dilemma (PD). The classical analysis of the PD game has been developed in
many different formats, including its finitely and infinitely repeated versions.
In the history of quantum games the PD game became the focus of an early and
important study [?] showing how to play a quantum form of a bimatrix game.
We selected a quantum scheme to play this bimatrix game in which the players'
actions or moves consist of selecting numbers in the range [0, 1], giving
the probabilities with which they apply two quantum mechanical (unitary and
Hermitian) operators on an initial 4-qubit pure quantum state [?]. The players'
actions in each stage are done on two different pairs of qubits. The classical
two-stage PD game corresponds to an unentangled initial state, and the classical
SGPO consists of the players defecting in both stages. Our results show that a
SGPO in which the players go for cooperation in a stage is a non-classical
feature that can be made to appear in quantum settings. The argument presented
here is based on the assumption that all games resulting from a play starting
with a 4-qubit quantum state of the form of eq. (7) are 'quantum forms' of
the classical two-stage game. This assumption originates from the fact that
the classical game corresponds to a particular 4-qubit quantum state which
is also unentangled. The assumption makes it possible to translate the desired
appearance of cooperation in a stage into certain conditions on the parameters
of the initial state, giving a SGPO in which the players decide to cooperate in
their first-stage interaction while knowing that they will both defect in the
next stage.

We are thankful to the anonymous referee who asked about the compelling
reason to choose a $2 \otimes 2 \otimes 2 \otimes 2$ dimensional Hilbert space
instead of a $2 \otimes 2$ dimensional one. A $2 \otimes 2$ dimensional
treatment of this problem, in the same quantization scheme, involves denominator
terms in the expressions for the payoff operators when these are obtained under
the condition that the classical game corresponds to an unentangled initial
state. It then leads to many 'if-then' conditions before one finally gets the
payoffs. By contrast, a treatment in $2 \otimes 2 \otimes 2 \otimes 2$
dimensions is much smoother. Also, a study of the concept of SGPO in a two-stage
repeated quantum game then becomes a logical extension of the
backwards-induction procedure proposed in ref. [?].

In conclusion, we found how cooperation in the two-stage PD game can be
achieved by quantum means. In infinitely repeated versions of the classical
PD game it is established that cooperation can occur in every stage of a SGPO,
even though the only NE in the stage game is defection [?]. In the two-stage PD
game, a SGPO in which the players cooperate in the first stage is a result with
no classical analogue. We have also indicated a possible way to study the
concept of SGPO in repeated quantum games.